An Emotion Speech Synthesis Method Based on VITS

Authors

Abstract

People and things can be connected through the Internet of Things (IoT), and speech synthesis is one of its key technologies. At this stage, end-to-end speech synthesis systems are capable of synthesizing relatively realistic human voices. However, the commonly used parallel text-to-speech systems lose useful information during the two-stage delivery process. Their control features for the synthesized speech are also monotonous and lack expressive features, including emotion, which makes emotional speech synthesis a challenging task. In this paper, we propose a new system named Emo-VITS, which builds on the highly expressive speech synthesis model VITS to realize emotional speech synthesis. We designed an emotion network that extracts global and local features from the reference audio. We then fused these features through an attention-based feature fusion module to achieve more accurate and comprehensive emotional speech synthesis. The experimental results show that the error rate of the Emo-VITS system rises slightly compared with the network without emotion, but this does not affect semantic understanding. Emo-VITS is superior to other networks in naturalness, sound quality, and similarity.
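The abstract does not specify how the global and local features are computed or fused. The following is a minimal sketch under stated assumptions, not the Emo-VITS implementation. It assumes the global (utterance-level) feature is a mean over frame-level vectors, and that fusion is scaled dot-product attention with the global vector as the query over the local frames. The helper names `global_feature` and `attention_fuse` are hypothetical, and the learned query/key/value projections a real model would use are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_feature(frames: List[Vector]) -> Vector:
    """Hypothetical global (utterance-level) emotion feature: mean over frames."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def attention_fuse(global_vec: Vector, local_frames: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical fusion: the global vector queries the frame-level (local)
    features with scaled dot-product attention, and the attended summary is
    added to the global vector. Learned projections are omitted for brevity."""
    dim = len(global_vec)
    # One attention score per frame, scaled by sqrt(dim) as in scaled dot-product attention.
    scores = [dot(global_vec, f) / math.sqrt(dim) for f in local_frames]
    weights = softmax(scores)
    # Weighted sum of the local frames (the attention output).
    attended = [sum(w * f[d] for w, f in zip(weights, local_frames)) for d in range(dim)]
    # Residual-style combination of global and attended local information.
    fused = [g + a for g, a in zip(global_vec, attended)]
    return fused, weights


# Toy usage: three 2-dimensional frame-level features from a reference clip.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g = global_feature(frames)
fused, weights = attention_fuse(g, frames)
print("global:", g)
print("weights:", weights)
print("fused:", fused)
```

Mean pooling and residual addition are illustrative choices only. The paper's actual emotion network and fusion module may use different pooling, learned projections, or a different way of combining the two feature streams.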


Similar resources

Rule-based Emotion Synthesis Using Concatenated Speech

Concatenative speech synthesis is increasing in popularity, as it offers higher quality output than previous formant synthesisers. However, because it is based on recorded speech units, concatenative synthesis offers a lesser degree of parametric control during resynthesis. Consequently, adding pragmatic effects such as different speaking styles and emotions at the synthesis stage is fundamentally more...


An Emotion Based Speech Analysis

In the real world, when two human beings are having a conversation, they are able to identify the mental state of the speaker by hearing their voice (when speaking on a telephone), or by both seeing their facial expression and hearing the way they speak. By contrast, when a human being is having a conversation with a robot, the robot is not able to understand the emotion of the...


HMM-Based Emotional Speech Synthesis Using Average Emotion Model

This paper presents a technique for synthesizing emotional speech based on an emotion-independent model which is called “average emotion” model. The average emotion model is trained using a multi-emotion speech database. Applying an MLLR-based model adaptation method, we can transform the average emotion model to present the target emotion which is not included in the training data. A multi-emot...


A corpus-based speech synthesis system with emotion

We propose a new approach to synthesizing emotional speech by a corpus-based concatenative speech synthesis system (ATR CHATR) using speech corpora of emotional speech. In this study, neither emotion-dependent prosody prediction nor signal processing per se is performed for emotional speech. Instead, a large speech corpus is created per emotion to synthesize speech with the appropriate emotio...


Speech Emotion Recognition Based on Sparse Representation

Speech emotion recognition is deemed to be a meaningful and intractable issue among a number of domains comprising sentiment analysis, computer science, pedagogy, and so on. In this study, we investigate speech emotion recognition based on sparse partial least squares regression (SPLSR) approach in depth. We make use of the sparse partial least squares regression method to implement the feature...


Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13042225